Pattern Matching in Trees and Strings

Author

  • Philip Bille
Abstract

We study the design of efficient algorithms for combinatorial pattern matching. More concretely, we study algorithms for tree matching, string matching, and string matching in compressed texts.

Tree Matching Survey
We begin with a survey of tree matching problems for labeled trees based on deleting, inserting, and relabeling nodes. We review the known results for the tree edit distance problem, the tree alignment distance problem, and the tree inclusion problem. The survey covers both ordered and unordered trees. For each problem we present one or more of the central algorithms in detail.

Tree Inclusion
Given rooted, ordered, and labeled trees P and T, the tree inclusion problem is to determine whether P can be obtained from T by deleting nodes in T. We show that the tree inclusion problem can be solved in $O(n_T)$ space with the following running time:

$$\min\left\{ O(l_P n_T),\; O(n_P l_T \log\log n_T + n_T),\; O\!\left(\frac{n_P n_T}{\log n_T} + n_T \log n_T\right) \right\}.$$

Here $n_S$ and $l_S$ denote the number of nodes and leaves in tree $S \in \{P, T\}$, respectively, and we assume that $n_P \le n_T$. Our results match or improve the previous time complexities while using only $O(n_T)$ space. All previous algorithms required $\Omega(n_P n_T)$ space in the worst case.

Tree Path Subsequence
Given rooted and labeled trees P and T, the tree path subsequence problem is to report which paths in P are subsequences of which paths in T. Here a path begins at the root and ends at a leaf. We show that the tree path subsequence problem can be solved in $O(n_T)$ space with the following running time:

$$\min\left\{ O(l_P n_T + n_P),\; O(n_P l_T + n_T),\; O\!\left(\frac{n_P n_T}{\log n_T} + n_T + n_P \log n_P\right) \right\}.$$

As with our results for the tree inclusion problem, this matches or improves the previous time complexities while using only $O(n_T)$ space. All previous algorithms required $\Omega(n_P n_T)$ space in the worst case.
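
To make the two tree problem definitions concrete, here is a small Python sketch. It is a naive reference implementation, not the $O(n_T)$-space algorithms of the thesis, and its memoized recursion can use far more time and space than the bounds above. It assumes a hypothetical representation in which a tree is a `(label, children)` pair with `children` a tuple of trees; all function names are illustrative. `tree_included` repeatedly either maps the leftmost remaining root of P onto the leftmost remaining root of T or deletes that root of T, splicing its children into its place; `tree_path_subsequences` enumerates root-to-leaf label sequences and tests every pair.

```python
from functools import lru_cache

# Hypothetical representation: a tree is (label, children), where children is
# a tuple of trees; an ordered forest is a tuple of trees.

@lru_cache(maxsize=None)
def _forest_included(pf: tuple, tf: tuple) -> bool:
    """True if forest pf can be obtained from forest tf by deleting nodes."""
    if not pf:
        return True   # the empty forest is included in any forest
    if not tf:
        return False  # nodes of pf remain but tf is exhausted
    (p_label, p_kids), p_rest = pf[0], pf[1:]
    (t_label, t_kids), t_rest = tf[0], tf[1:]
    # Map the first root of pf onto the first root of tf: its children must
    # embed below that node, and its right siblings to the right of it.
    if (p_label == t_label and _forest_included(p_kids, t_kids)
            and _forest_included(p_rest, t_rest)):
        return True
    # Otherwise delete the first root of tf; its children take its place.
    return _forest_included(pf, t_kids + t_rest)

def tree_included(p, t) -> bool:
    return _forest_included((p,), (t,))

def root_to_leaf_paths(tree) -> list:
    label, kids = tree
    if not kids:
        return [[label]]
    return [[label] + rest for kid in kids for rest in root_to_leaf_paths(kid)]

def is_subsequence(a, b) -> bool:
    it = iter(b)
    return all(x in it for x in a)  # each `in` advances `it` past the match

def tree_path_subsequences(p, t) -> list:
    """Pairs (i, j): path i of p is a subsequence of path j of t."""
    p_paths, t_paths = root_to_leaf_paths(p), root_to_leaf_paths(t)
    return [(i, j) for i, a in enumerate(p_paths)
            for j, b in enumerate(t_paths) if is_subsequence(a, b)]
```

For example, with T = a(b(c), d), `tree_included` accepts a(c, d) (delete b) but rejects a(d, c), because deleting nodes preserves left-to-right order.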

Regular Expression Matching Using the Four Russian Technique
Given a regular expression R and a string Q, the regular expression matching problem is to determine whether Q matches any of the strings specified by R. We give an algorithm for regular expression matching using $O(nm/\log n + n + m \log m)$ time and $O(n)$ space, where m and n are the lengths of R and Q, respectively. This matches the running time of the fastest known algorithm for the problem while improving the space from $O(nm/\log n)$ to $O(n)$. Our algorithm is based on the Four Russian Technique. We extend our ideas to improve the results for the approximate regular expression matching problem, the string edit distance problem, and the subsequence indexing problem.
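
For orientation, the sketch below shows the classical baseline that Four-Russian-style algorithms accelerate: build a Thompson NFA for R with $O(m)$ states and simulate it on Q by maintaining the set of active states, for $O(nm)$ time in total. This is a plainly simpler, swapped-in technique, not the thesis algorithm; roughly, the Four Russian Technique speeds up such a simulation by precomputing tables for small pieces of the automaton, which is not shown here. The supported syntax (single-character literals, `|`, `*`, and parentheses, with no escapes) and the names `compile_regex` and `matches` are assumptions for illustration.

```python
def compile_regex(r: str):
    """Thompson construction; returns (eps, char, start, accept)."""
    eps: list = []    # eps[s]: epsilon successors of state s
    char: list = []   # char[s]: (symbol, successor) or None
    pos = 0

    def new() -> int:
        eps.append([])
        char.append(None)
        return len(eps) - 1

    def peek():
        return r[pos] if pos < len(r) else None

    def expr():  # expr := term ('|' term)*
        nonlocal pos
        s, a = term()
        while peek() == '|':
            pos += 1
            s2, a2 = term()
            ns, na = new(), new()
            eps[ns] += [s, s2]
            eps[a].append(na)
            eps[a2].append(na)
            s, a = ns, na
        return s, a

    def term():  # term := factor*  (may be empty)
        s = a = new()
        while peek() is not None and peek() not in '|)':
            fs, fa = factor()
            eps[a].append(fs)
            a = fa
        return s, a

    def factor():  # factor := atom '*'*
        nonlocal pos
        s, a = atom()
        while peek() == '*':
            pos += 1
            ns, na = new(), new()
            eps[ns] += [s, na]
            eps[a] += [s, na]
            s, a = ns, na
        return s, a

    def atom():  # atom := literal | '(' expr ')'
        nonlocal pos
        c = peek()
        pos += 1
        if c == '(':
            s, a = expr()
            if peek() != ')':
                raise ValueError("unbalanced parenthesis")
            pos += 1
            return s, a
        s, a = new(), new()
        char[s] = (c, a)
        return s, a

    start, accept = expr()
    if pos != len(r):
        raise ValueError("unexpected ')'")
    return eps, char, start, accept

def matches(r: str, q: str) -> bool:
    """True if the whole string q is in the language of r."""
    eps, char, start, accept = compile_regex(r)

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for t in eps[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    active = closure({start})
    for c in q:  # one O(m) step per character of q
        active = closure({char[s][1] for s in active
                          if char[s] is not None and char[s][0] == c})
    return accept in active
```

For instance, `matches("a(b|c)*d", "abcbd")` holds while `matches("a(b|c)*d", "abed")` does not.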

Similar resources

Parameterized matching on non-linear structures

The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. In clas...
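
A minimal sketch of the parameterized matching condition, assuming for simplicity that every symbol may be renamed and that the renaming must be one-to-one, as in the usual definition. Two equal-length strings p-match when such a renaming maps one onto the other; the check records the renaming in both directions to enforce consistency. The names `p_matches` and `p_occurrences` are hypothetical, and the naive scan takes $O(nm)$ time.

```python
def p_matches(x: str, y: str) -> bool:
    """True if y is obtained from x by a consistent one-to-one renaming."""
    if len(x) != len(y):
        return False
    fwd, bwd = {}, {}  # renaming x -> y and its inverse
    for a, b in zip(x, y):
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True

def p_occurrences(pattern: str, text: str) -> list:
    """Start positions of all windows of text that p-match pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_matches(pattern, text[i:i + m])]
```

For example, "abab" p-matches "xyxy" but not "xxxx", since a and b cannot both be renamed to x.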

Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs

We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.

Binary Jumbled Pattern Matching via All-Pairs Shortest Paths

In binary jumbled pattern matching we wish to preprocess a binary string S in order to answer queries (i, j) which ask for a substring of S that is of size i and has exactly j 1-bits. The problem naturally generalizes to node-labeled trees and graphs by replacing “substring” with “connected subgraph”. In this paper, we give an $n^2/2^{\Omega(\log n/\log\log n)^{1/2}}$ time solution for both strings and trees. This...
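
For the string case, a naive index already answers queries in constant time, using the well-known interval property: sliding a length-i window one position changes its number of 1-bits by at most one, so every count between the minimum and maximum over length-i windows occurs. The sketch below precomputes these extremes in $O(n^2)$ time; it is only this quadratic baseline, not the subquadratic solution described in the snippet, and the function names are hypothetical.

```python
def build_jumbled_index(s: str):
    """lo[i]/hi[i]: min/max number of 1-bits over all length-i substrings of s."""
    n = len(s)
    prefix = [0]  # prefix[k] = number of 1-bits in s[:k]
    for ch in s:
        prefix.append(prefix[-1] + (ch == '1'))
    lo, hi = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        counts = [prefix[k + i] - prefix[k] for k in range(n - i + 1)]
        lo[i], hi[i] = min(counts), max(counts)
    return lo, hi

def has_jumbled_match(index, i: int, j: int) -> bool:
    # Every count between lo[i] and hi[i] is attained by some window,
    # so a query reduces to an interval check.
    lo, hi = index
    return 1 <= i < len(lo) and lo[i] <= j <= hi[i]
```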

Algorithmics on SLP-compressed strings: A survey

Results on algorithmic problems on strings that are given in a compressed form via straightline programs are surveyed. A straight-line program is a context-free grammar that generates exactly one string. In this way, exponential compression rates can be achieved. Among others, we study pattern matching for compressed strings, membership problems for compressed strings in various kinds of formal...
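
To illustrate how algorithms can work on an SLP without decompressing it, the sketch below computes the length of each rule's expansion and then reads a single character by descending from the start rule, in time proportional to the grammar's height. The rule encoding (each rule is either a terminal character or a pair of indices of earlier rules, with the last rule as the start symbol) and the function names are assumptions for illustration.

```python
def slp_lengths(rules) -> list:
    """lengths[k] = length of the string derived from rule k."""
    lengths = []
    for r in rules:
        lengths.append(1 if isinstance(r, str) else lengths[r[0]] + lengths[r[1]])
    return lengths

def slp_char_at(rules, lengths, pos: int) -> str:
    """Character at position pos of the string derived by the last rule."""
    if not 0 <= pos < lengths[-1]:
        raise IndexError(pos)
    k = len(rules) - 1
    while not isinstance(rules[k], str):
        left, right = rules[k]
        if pos < lengths[left]:
            k = left                # the position lies in the left part
        else:
            pos -= lengths[left]    # skip the left part entirely
            k = right
    return rules[k]
```

With `rules = ['a', 'b', (0, 1)] + [(k, k) for k in range(2, 32)]`, 33 rules derive $(ab)^{2^{30}}$, a string of length $2^{31}$, which shows the exponential compression mentioned above.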

A Novel Data Structure for String Matching Applicable in Network Processing

We address prefix matching problems which constitute the building block of some applications in the computer realm and related area. It is assumed there are strings of an alphabet Σ which are ordered. The data strings can have different lengths and some of them can be prefixes of others. A well known application of prefix matching is layer 3 IP switching in which routers forward an IP packet by...
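
As background for the prefix matching setting, here is the textbook longest-prefix-match structure: a plain binary trie, not the data structure proposed in that paper. It assumes prefixes and addresses are given as bit strings and that each stored prefix carries a hypothetical next-hop label; lookup walks the address bit by bit and remembers the last stored prefix it passed.

```python
class BinaryTrie:
    """Longest-prefix match over bit-string prefixes."""

    def __init__(self):
        self.root = {}  # children under keys '0'/'1'; a stored prefix sets 'hop'

    def insert(self, prefix: str, next_hop: str) -> None:
        node = self.root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node['hop'] = next_hop

    def longest_prefix_match(self, address: str):
        node, best = self.root, self.root.get('hop')
        for bit in address:
            if bit not in node:
                break
            node = node[bit]
            best = node.get('hop', best)  # a longer stored prefix wins
        return best
```

For example, after inserting "10" and "1011", the address "10110000" resolves to the entry for "1011", while "10010000" falls back to "10".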

Abelian pattern matching in strings

Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...
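
A minimal sketch of exact abelian matching: slide a window of the pattern's length over the text and compare character counts (Parikh vectors), updating the window's counts in constant time per shift. The function name is hypothetical; zero counts are deleted so that `Counter` equality behaves like multiset equality on all Python versions.

```python
from collections import Counter

def abelian_occurrences(pattern: str, text: str) -> list:
    """Start positions of windows of text that are permutations of pattern."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = Counter(pattern)
    window = Counter(text[:m])
    hits = [0] if window == target else []
    for i in range(m, len(text)):
        window[text[i]] += 1        # character entering the window
        window[text[i - m]] -= 1    # character leaving the window
        if window[text[i - m]] == 0:
            del window[text[i - m]]
        if window == target:
            hits.append(i - m + 1)
    return hits
```

With the pattern "abbc" from the snippet, the text "xbabcbbacz" has abelian occurrences starting at positions 1, 2, 4, and 5.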

Journal:
  • CoRR

Volume: abs/0708.4288

Publication date: 2007